Microsoft SQL Server 2008 R2 : Monitoring Replication (part 2) - New and Improved Peer-to-Peer Replication

2/9/2012 9:18:51 AM

New and Improved Peer-to-Peer Replication

For data distribution requirements that must have updates in multiple nodes, peer-to-peer replication is ideal. You must worry about conflicts, but typically they are rare and very easily handled by a simple conflict handler. Peer-to-peer replication is transactional replication based, and can have any number of peers. A separate wizard is used to set up peer-to-peer replication because of its unique characteristics. Remember, there are essentially no subscribers, just publishers. All peers are handled by one or more distributors and are assigned a unique peer originator ID value to help in transactional consistency and conflict resolution. If you have been doing replication since the SQL Server 6.5 days, the peer-to-peer configuration topology viewer is much like that old user interface. Figure 4 shows the special invocation of peer-to-peer creation from SQL Server Management Studio.

Figure 4. Peer-to-peer wizard launch.

You basically create publications that become shared in the peer-to-peer topology. You add “nodes” on an equal hierarchical level that participate in the publication equally. You can add any number of peers (nodes), and new in SQL Server 2008, this can be done without interrupting the existing replication topology. Figure 5 shows the setup of a publication for a database named REPLTEST_PEER2PEER and the addition of two nodes to the topology.

Figure 5. Setup of the publication and two nodes in a peer-to-peer topology (1 thru 4).

Peers are typically initialized manually for consistency purposes. Figure 6 shows the Configure Peer-to-Peer Topology Wizard summary and the successful creation of the full (two-node) topology. Note that each node is assigned a unique originator ID, and conflict detection has been enabled. Notice also that it really doesn’t matter which node is listed first in the wizard topology because they are equal (no hierarchy exists) and the arrow is bidirectional.

Figure 6. Configure Peer-to-Peer Topology Wizard summary and complete topology.

The Performance Monitor

You can use Windows Performance Monitor to monitor the health of your replication scenario. When you install SQL Server, you get several new objects and counters in Performance Monitor:

SQLServer:Replication Agents— This object contains counters used to monitor the status of all replication agents, including the total number running.
SQLServer:Replication Dist— This object contains counters used to monitor the status of the distribution agents, including the latency and number of transactions transferred per second.
SQLServer:Replication Logreader— This object contains counters used to monitor the status of the log reader agent, including the latency and number of transactions transferred per second.
SQLServer:Replication Merge— This object contains counters used to monitor the status of the merge agents, including the number of transactions and number of conflicts per second.
SQLServer:Replication Snapshot— This object contains counters used to monitor the status of the snapshot agents, including the number of transactions per second.

As you can see in Figure 7 , we chose to monitor the typical things critical for replication: the LogReader counters, the distribution server counters, some default processor times, and disk queue lengths to keep an eye on load at the publisher. Figure 19.63 shows some spikes during log reader activity as big transactions hit the publisher. However, they drop off quickly, easily handling the large transaction volumes.

Figure 7. Performance Monitor counters for replication monitoring.

Replication in Heterogeneous Environments

SQL Server 2008 allows for transactional and snapshot replication of data into and out of environments other than SQL Server. This is termed heterogeneous replication. The easiest way to set up this replication is to use ODBC or OLE DB and create a push subscription to the subscriber. This is much easier to make work than you might imagine. SQL Server can publish to the following database types:

Microsoft Access
Oracle
Sybase
IBM DB2/AS400
IBM DB2/MVS

SQL Server can replicate data to any other type of database, provided that the following are true:

The driver must be ODBC Level 1 compliant.
The driver must be 32-bit, thread safe, and designed for the processor architecture on which the distribution process runs.
The driver must support transactions.
The driver and underlying database must support Data Definition Language (DDL).
The underlying database cannot be read-only.

Backup and Recovery in a Replication Configuration

A replication-oriented backup strategy will reap major benefits for you after you have implemented a data replication configuration. You must realize that the scope of data and what you must back up together have changed. In addition, you must be aware of the recovery time frame and plan your backup/recovery strategy accordingly. You might not have multiple hours available to you to recover an entire replication topology. You now have databases that are conceptually joined, and you might need to back them up together in one synchronized backup. Figure 8 shows overall backup strategies for the most common recovery needs.

Figure 8. Common backup strategies for different recovery needs.
Recovery Need	Backup Strategy
100% data, All sites, Small Recovery Window	Coordinated DB backups at all sites involved in the replication configuration (publisher, distributor and all subscribers). Somewhat complex to do.
100% data, All sites, Medium Recovery Window	Backup Publication DB and Distribution DB together. Replication can be recovered from this point very easily without reconfiguring anything. Just have to re-initialize the subscribers. This is the most common approach being used.
100% data, All sites Big Recovery Window	Backup of Publication DB only. Can then reconfigure replication via scrips and reinitialize distribution, and all subscribers fairly easily.

When backing up environments, you need to back up the following at each site:

Publisher (published database, msdb, and master)
Distributor (distribution database, msdb, and master)
Subscribers (subscriber database, optionally msdb, and master when pull subscriptions are being done)

You should always make copies of your replication scripts and keep them handy. At a minimum, you need to keep copies at the publisher and distributor and one more location, such as at one of your subscribers. You will use them for recovery someday.

You shouldn’t forget to back up master and msdb when any new replication object is created, updated, or deleted.

If you have allowed updating of subscribers using queued updates, you need to expand your backup capability to include these queues.

In general, you will find that even when you walk up and pull the plug on your distribution server, publication server, or any subscribers, automatic recovery works well to get you back online and replicating quickly, without human intervention.

Some Thoughts on Performance

From a performance point of view, the replication configuration defaults err on the side of optimal throughput. That’s the good news. The bad news is that everybody is different in some way, so you have to consider a bit of tuning of your replication configuration. In general, you can get your replication configuration working well by doing the following:

Keeping the amount of data to be replicated at any one point small by running agents continuously, instead of at long, scheduled intervals.
Setting a minimum amount of memory allocated to SQL Server by using the Min Server Memory option to guarantee ample memory across the board.
Using good disk drive physical separation rules, such as keeping the transaction log on a separate disk drive from the data portion. Your transaction log is much more heavily used when you opt for transactional replication.
Putting your snapshot working directory on a separate disk drive to minimize disk drive arm contention. You should use a separate snapshot folder for each publication.
Publishing only what you need. By selectively publishing only the minimum amount of data required, you implement a much more efficient replication configuration, which is faster overall.
Trying to run snapshots in nonpeak times so your network and production environments aren’t bogged down.
Minimizing transformation of data involved with replication.

Log Shipping

If you have a small need to create a read-only (ad hoc query/reporting) database environment that can tolerate a high degree of data latency, you might be a candidate for using log shipping. Log shipping is still a feature for SQL Server 2008, but it will be deprecated by the next release. In other words, it might be easy to use and easy to manage, but it is being phased out as a feature of SQL Server. For those who have current log shipping configurations, it is time to move to database mirroring. This transition will be easy because the two capabilities are so much alike. (Actually, many aspects of database mirroring came from log shipping.)

Data Replication and Database Mirroring for Fault Tolerance and High Availability

SQL Server 2008 allows you to use combinations of options to achieve higher availability levels. A prime example is combining data replication with database mirroring to provide maximum availability of data, scalability to users, and fault tolerance via failover at potentially each node in the replication topology. You can start with the publisher and distributor, making them both database mirror failover configurations. Building up a combination of both options together is the best of both worlds: the super low latency of database mirroring for fault tolerance and the high availability (and scalability) of data through replication.